Plagiarism Detection in Arabic Documents: Approaches, Architecture and

Authors

  • Boubaker Kahloula
  • Jawad Berri
Abstract

Plagiarism detection is a sensitive field of research that has gained a lot of interest in the past few years. Although plagiarism detection systems are developed to check text in a variety of languages, they perform better when dedicated to a specific language, because they can take that language's specific features into account, which leads to better-quality results. Query optimization and document reduction are two major processing modules that play a key role in optimizing the response time and result quality of these systems, and hence determine their efficiency and effectiveness. This paper proposes an analysis of approaches, an architecture, and a system for detecting plagiarism in Arabic documents. The analysis focuses on the methods and techniques used to detect plagiarism. The proposed web-based architecture exhibits the major processing modules of a plagiarism detection system, organized into four layers inside a processing component. The architecture has been used to develop a plagiarism detection system for the Arabic language that offers the user a set of functions for checking a text and analyzing the results through a graphical user interface.

Subject Categories and Descriptors: [H.3.1 Content Analysis and Indexing]: Linguistic processing; [I.2 Artificial Intelligence]: Natural language interfaces; [I.2.7 Natural Language Processing]: Text Analysis; [I.2.3 Clustering]: Similarity Measures

General Terms: Text Analysis, Arabic Language Processing, Similarity Detection


Similar resources

Detection of Plagiarism in Arabic Documents

Many language-sensitive tools for detecting plagiarism in natural language documents have been developed, particularly for English. Language-independent tools exist as well, but are considered restrictive as they usually do not take into account specific language features. Detecting plagiarism in Arabic documents is a particularly challenging task because of the complex linguistic structure of A...


Hybrid Segmentation Prototype for Arabic Text-Based Documents: Towards Plagiarism Detection

The contribution of this work relates to the field of Arabic text-based document analysis for the detection of plagiarism. This analysis will be carried out according to the triadic computation model of document similarity. The authors propose a hybrid segmentation prototype for Arabic text-based documents that links different processing steps in order to generate the similarity rate between th...


A New Corpus for the Evaluation of Arabic Intrinsic Plagiarism Detection

The present paper introduces the first corpus for the evaluation of Arabic intrinsic plagiarism detection. The corpus consists of 1024 artificial suspicious documents in which 2833 plagiarism cases have been inserted automatically from source documents.


Plagiarism Detection In Arabic Scripts Using Fuzzy Information Retrieval

The structure of the Arabic language calls for a fuzzy or vague concept to reveal dishonest practices in Arabic documents. In this paper, we present a statement-based plagiarism detection approach for Arabic scripts using a fuzzy-set IR model. The degree of similarity is calculated and compared to a threshold value to judge whether two statements are the same or different. Our corpus c...
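
The threshold comparison described in this abstract can be illustrated with a minimal sketch. It is not the fuzzy-set IR model of the paper: as a simplifying assumption it substitutes plain Jaccard overlap between token sets, and the function names and the default threshold of 0.5 are hypothetical choices made only for illustration.

```python
def tokenize(statement: str) -> set:
    """Split a statement into a set of whitespace-delimited tokens.

    Assumption: no stemming, diacritic removal, or other Arabic-specific
    normalization is applied; a real system would need such steps.
    """
    return set(statement.split())


def jaccard_similarity(a: str, b: str) -> float:
    """Return |A ∩ B| / |A ∪ B| for the token sets of two statements."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 0.0  # two empty statements: treat as not similar
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_similar(a: str, b: str, threshold: float = 0.5) -> bool:
    """Judge two statements 'the same' when similarity reaches the threshold.

    The threshold value is a hypothetical default, not taken from the paper.
    """
    return jaccard_similarity(a, b) >= threshold
```

Because `str.split` handles Unicode whitespace, the same functions accept Arabic statements unchanged; only the similarity measure itself would need to be swapped for a fuzzy-set one to approach the method the abstract names.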


Analyzing Similarity in Mathematical Content To Enhance the Detection of Academic Plagiarism

Despite the effort put into the detection of academic plagiarism, it continues to be a ubiquitous problem spanning all disciplines. Various tools have been developed to assist human inspectors by automatically identifying suspicious documents. However, to our knowledge currently none of these tools use mathematical content for their analysis. This is problematic, because mathematical content po...




Publication date: 2016